Abstract

It has been proposed that populations of neurons process information in
terms of probability density functions (PDFs) of analog variables. Such
analog variables range, for example, from target luminance and depth on
the sensory interface to eye position and joint angles on the motor
output side. The requirement that analog variables must be processed
leads inevitably to a probabilistic description, while the limited
precision and lifetime of the neuronal processing units lead naturally
to a population representation of information. We show how a
time-dependent probability density $p(x;t)$ over variable $x$, residing
in a specified function space of dimension $D$, may be decoded from the
neuronal activities in a population as a linear combination of certain
decoding functions $\phi_i(x)$, with coefficients given by the $N$ firing
rates $a_i(t)$ (generally with $D \ll N$). We show how the neuronal
encoding process may be described by projecting a set of complementary
encoding functions $\tilde{\phi}_i(x)$ on the probability density
$p(x;t)$, and passing the result through a rectifying nonlinear
activation function. We show how both encoders $\tilde{\phi}_i(x)$ and
decoders $\phi_i(x)$ may be determined by minimizing cost functions that
quantify the inaccuracy of the representation. Expressing a given
computation in terms of manipulation and transformation of
probabilities, we show how this representation leads to a neural circuit
that can carry out the required computation within a consistent Bayesian
framework, with the synaptic weights being explicitly generated in terms
of encoders, decoders, conditional probabilities, and priors.
1 Introduction



It has been hypothesized (Anderson, 1994, 1996) that circuits of corti- 
cal neurons perform statistical inference, and, in particular, that they 
encode and process information about analog variables in the form 
of probability density functions (PDFs). This PDF hypothesis pro- 
vides a unified framework for understanding diverse observations from 
experimental neurobiology, constructing neural network models, and 
gaining insights into how neurons can implement a rich collection of 
information-processing functions. 

The PDF hypothesis derives from two major themes of computa- 
tional neuroscience. The first theme stems from efforts to determine 
how information is represented by neural systems, through understanding
how neural activity correlates with external cues or actions (such as
sensory stimuli or motor responses). Our understanding of neural encoding
can be tested by inferring sensory input or motor output from a set of
neural activities, and comparing the estimate thus obtained to the
external cue or action.

To decode the response from a population of neurons requires pro- 
cedures to infer information from individual spike trains, as well as 
procedures to combine these results into an aggregate estimate. An 
optimal method for decoding information from individual neural spike
trains has been developed (Bialek et al., 1991; Bialek and Rieke, 1992;
Rieke et al., 1997) and applied to movement-sensitive neurons in the
blowfly (Rieke et al., 1997) and to other systems (Theunissen et al.,
1996). This method uses a linear filter to extract the
maximum possible information from each spike (typically a few bits;
see Rieke et al., 1997), as measured by the ability to reconstruct the 
stimulus from the spike train. In these studies, the linear filter deter- 
mines a firing rate from the spike trains; this firing rate contains most 
of the information, with additional information possibly encoded in 
other aspects of the activity patterns. In the current work, we assume 
that the firing rates capture the essential behavior of neural systems, 
and will not explicitly consider spike trains. 

Methods for decoding information from the firing rates of pop- 
ulations of neurons were pioneered by Georgopoulos and collabora- 
tors. They showed that a "population vector" derived from the fir- 
ing rates of a population of cortical neurons can be used to predict 



the intended arm movements of monkeys (Georgopoulos et al., 1986;
Schwartz, 1993). This vector estimate of direction, $\mathbf{V}_{\mathrm{est}}$,
is obtained from the neural firing rates $a_i$ by

$$ \mathbf{V}_{\mathrm{est}} = \sum_{i=1}^{N} a_i\, \mathbf{C}_i \qquad (1) $$

where the preferred direction vectors, $\mathbf{C}_i$, indicate the direction at
which neuron i has its maximal firing response. The population vector
approach has been refined and extended by several authors; in par- 
ticular, Salinas and Abbott (1994) provide an excellent discussion of 
several such refinements, as well as introducing their own. The em- 
phasis in such studies has been the reconstruction of vector quantities 
from populations of neural responses by a process that in several cases 
appears to be computation of an expectation value from an implicit 
probability distribution. 
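A minimal Python sketch of equation 1, under assumptions of our own: a hypothetical population of 16 neurons with evenly spaced preferred directions in the plane and rectified cosine tuning (the baseline and gain values are arbitrary, not taken from the cited studies). For such a population the population vector points along the true movement direction.

```python
import numpy as np

def population_vector(rates, preferred):
    """Equation 1: V_est = sum_i a_i C_i.

    rates:     shape (N,) firing rates a_i
    preferred: shape (N, 2) unit preferred-direction vectors C_i
    """
    return rates @ preferred

# Hypothetical population: N neurons with evenly spaced preferred angles.
N = 16
angles = 2 * np.pi * np.arange(N) / N
preferred = np.column_stack([np.cos(angles), np.sin(angles)])

def cosine_rates(theta, baseline=10.0, gain=8.0):
    # Assumed cosine tuning, rectified so that rates stay non-negative.
    return np.maximum(baseline + gain * np.cos(theta - angles), 0.0)

theta_true = 0.7  # movement direction in radians
v_est = population_vector(cosine_rates(theta_true), preferred)
theta_est = np.arctan2(v_est[1], v_est[0])
print(f"true {theta_true:.3f} rad, estimated {theta_est:.3f} rad")
```

With evenly spaced preferred directions the baseline terms cancel in the sum, which is why the estimate is exact here; uneven or noisy populations generally give biased estimates.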

The second theme leading to the PDF hypothesis stems from an 
analysis showing that the original Hopfield neural network implements, 
in effect, Bayesian inference on analog quantities in terms of PDFs
(Anderson and Abrahams, 1987). The role of PDFs in neural information
processing is being explored along a number of avenues. As in the present
work, Zemel et al. (1998) have investigated population coding of
probability distributions, but with different representations than those
we will consider here. Several extensions of this representation scheme
have been developed (Zemel, 1999; Zemel and Dayan, 1999; Yang and Zemel,
2000) that feature information propagation between interacting neural
populations. Further, a number of related models have been introduced.
Of particular note is a dynamic routing model of directed attention
(Anderson and Van Essen, 1987; Olshausen et al., 1993, 1995).
Additionally, several "stochastic machines" (Haykin, 1999) have been
formulated, including Boltzmann machines (Hinton and Sejnowski, 1986),
sigmoid belief networks (Neal, 1992), and Helmholtz machines (Dayan and
Hinton, 1996). Stochastic machines are built of stochastic neurons that
choose one of two possible states in a probabilistic manner. Learning
rules for stochastic machines enable such systems to model the underlying
probability distribution of a given data set; however, they are not
biologically realistic.

The two prominent themes of population coding and probabilistic 
inference are combined in the PDF hypothesis through the assertion 
that a physical variable x is described by a neural population at time t 
in terms of a PDF $p(x;t)$, rather than as a single-valued estimate $x(t)$.
Such a PDF description has the significant advantage that it not only
permits a single-valued estimate to be calculated, but also provides
measures of the uncertainty of such estimates. For example, a specific
value $\xi(t)$ at time t can be represented as the mean of a normal
distribution over x with variance $\sigma^2(t)$, so that

$$ p(x;t) = \mathcal{N}\left(x;\, \xi(t),\, \sigma^2(t)\right) \qquad (2) $$

This PDF allows $\xi(t)$ to be known very precisely (small variance) or
with a great deal of uncertainty (large variance).

More generally, we consider a PDF described at time t in terms of 
a set of D underlying parameters $\{A_\mu\}$. Guided by the experimentally
observed linear decoding rules discussed above, we will take the PDFs
to be represented by

$$ p(x;t) \equiv p(x;\{A_\mu(t)\}) = \sum_{\mu=1}^{D} A_\mu(t)\, \Phi_\mu(x) \qquad (3) $$

The basis functions $\Phi_\mu(x)$ are orthonormal functions that define the
PDFs the neural circuit can represent. We describe x with $p(x;\{A_\mu(t)\})$
rather than $p(x \mid \{A_\mu(t)\})$ to distinguish between the assumed forms
of models (equation 3) and relationships that exist amongst random
variables (viz. conditional probabilities).

The amplitudes $A_\mu(t)$ of the representations defined by equation 3
cannot be interpreted as neuronal firing rates: they can take on negative
values and are not limited to the low precision of neuronal firing rates.
However, we can represent a PDF in terms of decoding functions $\phi_i(x)$
and firing rates $a_i(t)$ associated with N neurons, so that

$$ p(x;t) = \sum_{i=1}^{N} a_i(t)\, \phi_i(x) \qquad (4) $$

Unlike the basis functions $\Phi_\mu(x)$, the decoding functions $\phi_i(x)$
form a highly redundant, overcomplete representation ($N \gg D$) that is
specialized for use with neurons of limited precision.

From the relations asserted in equations 3 and 4 we can identify three
relevant problem domains. First, we have the physical variable x,
described by the PDF $p(x;t)$. This domain is that of high-level concepts.
Second, we have the neural network with its measurable neural firing
rates $a_i(t)$. The neural network constitutes a physical implementation
of the desired computations on the physical variable, so the properties
of this second domain should be chosen to match the properties of
biological systems as closely as possible. In particular, the neural
firing rates must be constrained to be non-negative quantities of low
precision. The third domain is that of the underlying parameters $A_\mu$,
which subserve an alternative, abstract implementation of the desired
computations. The constraint in this case is minimality: we concern
ourselves only with mathematical convenience and allow the $A_\mu$ to be
of arbitrary precision and to take on negative values.

Following Zemel et al. (1998), the domain of physical variables is 
called the implicit space and the domain of measurable quantities the 
explicit space. Extending their nomenclature, we shall refer to the third 
domain as the minimal space. The minimal space will serve as a useful 
bridge between the two other spaces. 

It may be conceptually helpful to regard the variables or parameters
$A_\mu(t)$ as the activities of a set of D "metaneurons," fictitious
entities that reside and act in the minimal space. However, it must be
emphasized that such metaneurons differ from real neurons in their
abilities to function with high precision and to produce negative
"firing rates" $A_\mu(t)$. Accordingly, they possess valuable properties
that will facilitate formal representation and analysis.



2 Obtaining the Neuronal Representation 

2.1 Multiple Levels of Representation 

The fundamental assumption of the framework to be developed in this
paper is that information about a physical variable x given a set of
parameters $A = \{A_1, A_2, A_3, \ldots\}$ at time t is represented by an
ensemble of neurons as a PDF $p(x; A_1(t), A_2(t), A_3(t), \ldots)$. For
notational convenience, we will usually abbreviate this quantity as
$p(x;t)$. This PDF can be determined from a set of neuronal firing rates
$\{a_i(t)\}$ using a set of decoding functions (or simply decoders)
$\phi_i(x)$, as prescribed in equation 4. In turn, a set of encoding
functions (encoders) $\tilde{\phi}_i(x)$ is used to determine the firing
rates from an assumed PDF by means of

$$ a_i(t) = f\left( \int \tilde{\phi}_i(x)\, p(x;t)\, dx \right) \qquad (5) $$

where a nonlinear activation function $f(\cdot)$ is introduced to preclude
negative firing rates. The encoding functions $\tilde{\phi}_i(x)$ must be
chosen so as to yield a close match to desired (i.e., experimentally
observed) firing rates $a_i(t)$. The decoding rule (equation 4) should in
general be viewed as returning only an approximation to the PDF: in
particular, it can decode functions that are negative somewhere and are
therefore not valid PDFs.
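For a delta-function PDF $p(x;t) = \delta(x - \xi)$, the integral in equation 5 simply samples the encoder, giving $a_i = f(\tilde{\phi}_i(\xi))$. The Python sketch below checks this with a narrow normalized Gaussian in place of the delta function; the four linear encoders $\tilde{\phi}_i(x) = g_i(x - c_i)$, the rectifier $f(u) = \max(u, 0)$, and the grid are illustrative assumptions.

```python
import numpy as np

# Midpoint grid on [-1, 1]; integrals become sums times dx.
G = 2000
dx = 2.0 / G
x = -1.0 + (np.arange(G) + 0.5) * dx

# Hypothetical linear encoders phi_tilde_i(x) = g_i * (x - c_i).
gains = np.array([1.0, -1.0, 1.0, -1.0])
thresholds = np.array([-0.5, -0.5, 0.5, 0.5])
encoders = gains[:, None] * (x[None, :] - thresholds[:, None])  # shape (N, G)

def relu(u):
    return np.maximum(u, 0.0)

def encode(pdf):
    """Equation 5: a_i = f( integral of phi_tilde_i(x) p(x) dx )."""
    return relu(encoders @ pdf * dx)

# A narrow Gaussian approximates delta(x - xi).
xi, width = 0.2, 0.01
pdf = np.exp(-0.5 * ((x - xi) / width) ** 2)
pdf /= pdf.sum() * dx  # normalize on the grid

rates = encode(pdf)
print(rates)  # close to relu(gains * (xi - thresholds))
```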

We can also represent the PDF using a complete orthonormal basis
$\{\Phi_\mu(x)\}$ for the space spanned by the decoders, as shown in
equation 3. Further, we can represent the decoding functions in terms of
this basis, writing

$$ \phi_i(x) = \sum_{\nu=1}^{D} n_{\nu i}\, \Phi_\nu(x) \qquad (6) $$

where the coupling coefficients $n_{\nu i}$ are to be determined. Since we
now have an orthonormal basis, the coefficients $A_\mu$ in equation 3 are
simply evaluated from

$$ A_\mu(t) = \int \Phi_\mu(x)\, p(x;t)\, dx \qquad (7) $$

The encoding and decoding rules based on the amplitudes $A_\mu(t)$ in the
minimal space are seen to parallel those based on the neuronal firing
rates $a_i(t)$, apart from the absence of a nonlinearity in equation 7.
In this section, we will develop methods to relate operations in the
mathematically convenient minimal space to the biologically plausible
implementation of PDFs in the explicit space of model neurons.
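To make equations 3 and 7 concrete, the Python sketch below assumes the two-dimensional straight-line basis on [-1, 1], $\Phi_1 = 1/\sqrt{2}$ and $\Phi_2 = \sqrt{3/2}\,x$ (one orthonormal choice; the exact basis of Figure 2a is not specified), and projects an assumed Gaussian PDF onto it. For any normalized PDF supported inside the interval, $A_1 = 1/\sqrt{2}$ and $A_2 = \sqrt{3/2}\,\langle x \rangle$, so the two amplitudes carry exactly the total probability and the mean, and the reconstruction of equation 3 is a straight line.

```python
import numpy as np

G = 2000
dx = 2.0 / G
x = -1.0 + (np.arange(G) + 0.5) * dx  # midpoint grid on [-1, 1]

# Orthonormal straight-line basis on [-1, 1] (assumed choice).
basis = np.vstack([np.full_like(x, 1.0 / np.sqrt(2.0)),
                   np.sqrt(1.5) * x])                      # shape (D, G)

def to_minimal(pdf):
    """Equation 7: A_mu = integral of Phi_mu(x) p(x) dx."""
    return basis @ pdf * dx

def from_minimal(A):
    """Equation 3: p(x) = sum_mu A_mu Phi_mu(x)."""
    return A @ basis

mean, width = 0.3, 0.1
pdf = np.exp(-0.5 * ((x - mean) / width) ** 2)
pdf /= pdf.sum() * dx

A = to_minimal(pdf)
line = from_minimal(A)   # best straight-line approximation to pdf
print(A)                 # approx [1/sqrt(2), sqrt(1.5) * 0.3]
```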
2.2 Obtaining the Encoding Functions



Although we do not know the encoding functions at this point, we do
know that they can be represented in terms of another set of basis
functions through

$$ \tilde{\phi}_i(x) = \sum_{\nu=1}^{D} k_{i\nu}\, \tilde{\Phi}_\nu(x) \qquad (8) $$

where the coupling coefficients $k_{i\nu}$ are in general distinct from the
$n_{\nu i}$. For many networks, it is appropriate to assume the basis for
the encoders to be identical to the basis for the decoders. For example,
in the case of the neural integrator (see section 2.5) the PDFs are
continually mapped into and out of the minimal space provided by the
$\tilde{\Phi}_\mu(x)$ and $\Phi_\nu(x)$. Thus, $\mathrm{span}\{\tilde{\Phi}_\mu\}$
can be equal to $\mathrm{span}\{\Phi_\nu\}$. For definiteness, we take
$\tilde{\Phi}_\mu(x) = \Phi_\mu(x)$.

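To find the encoding functions, we define the cost function

$$ \begin{aligned} E_1 &= \frac{1}{2} \sum_i \int \left[ a_i(A) - f\left( \int \tilde{\phi}_i(x)\, p(x;A)\, dx \right) \right]^2 p(A)\, dA \\ &= \frac{1}{2} \sum_i \int \left[ a_i(A) - f\left( \sum_\nu k_{i\nu} \int \Phi_\nu(x)\, p(x;A)\, dx \right) \right]^2 p(A)\, dA \end{aligned} \qquad (9) $$

We now use gradient descent to determine the $k_{i\nu}$ that minimize $E_1$:

$$ \frac{dk_{j\nu}}{dt} = -\eta \frac{\partial E_1}{\partial k_{j\nu}} = \eta \int \left[ a_j(A) - f\left(h_j(A)\right) \right] f'\left(h_j(A)\right) U_\nu(A)\, p(A)\, dA \qquad (10) $$

where $\eta$ is a rate constant. We have defined

$$ U_\nu(A) = \int \Phi_\nu(x)\, p(x;A)\, dx \qquad (11) $$

$$ h_j(A) = \sum_\nu k_{j\nu}\, U_\nu(A) \qquad (12) $$

to simplify the expression.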
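The gradient descent of equation 10 can be sketched in Python for the delta-function inputs used below, for which $U_\nu(A)$ reduces to $\Phi_\nu(\xi)$. Everything specific here is an illustrative assumption: the straight-line basis $\Phi_1 = 1/\sqrt{2}$, $\Phi_2 = \sqrt{3/2}\,x$, eight hypothetical rectified-linear rate profiles, a uniform $p(A)$ represented by 200 samples of $\xi$, the step size, and the choice $f'(0) = 1$ for the rectifier. The differential equation is integrated with plain Euler steps.

```python
import numpy as np

# Samples of the encoded value xi, drawn uniformly: this plays the role of p(A).
S = 200
xi = -1.0 + (np.arange(S) + 0.5) * (2.0 / S)

# Minimal-space basis evaluated at xi; for p(x; xi) = delta(x - xi),
# U_nu(xi) = Phi_nu(xi) (equation 11).
U = np.vstack([np.full_like(xi, 1.0 / np.sqrt(2.0)), np.sqrt(1.5) * xi])  # (D, S)

# Hypothetical piecewise-linear target rates a_j(xi) = max(0, g_j (xi - c_j)).
gains = np.tile([1.0, -1.0], 4)
thresholds = np.repeat(np.linspace(-0.5, 0.5, 4), 2)
targets = np.maximum(gains[:, None] * (xi[None, :] - thresholds[:, None]), 0.0)  # (N, S)

def f(h):
    return np.maximum(h, 0.0)          # rectification

def f_prime(h):
    return (h >= 0.0).astype(float)    # subgradient choice: 1 at h = 0

def loss(k):
    h = k @ U                          # equation 12
    return 0.5 * np.mean(np.sum((targets - f(h)) ** 2, axis=0))

def fit_encoders(eta=2.0, steps=5000):
    k = np.zeros((targets.shape[0], U.shape[0]))  # k_{j nu}, shape (N, D)
    for _ in range(steps):
        h = k @ U
        err = (targets - f(h)) * f_prime(h)        # (N, S)
        k += eta * (err @ U.T) / U.shape[1]        # Euler step of equation 10
    return k

k = fit_encoders()
print(loss(np.zeros_like(k)), loss(k))
```

Because every assumed target profile is a rectified straight line, it lies within reach of $f(\sum_\nu k_{j\nu}\Phi_\nu)$, so the loss can be driven close to zero; encoders recovered this way are straight lines with the same slopes as the rate profiles, in line with Figure 2b.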

To verify the efficacy of this optimization procedure, we apply it to 
a set of broadly tuned, biologically reasonable neuronal responses to 
a precise input signal. In particular, we use piecewise-linear activities
(Figure 1), essentially one-dimensional versions of the response
functions entering Georgopoulos's population vector, to define our
neural responses over the interval [-1, 1] (see also Figure 4 in Fuchs et
al., 1988).

Figure 1: Broad piecewise-linear functions provide biologically plausible
neural firing rates. These firing profiles are similar to one-dimensional
versions of the neural responses used to construct Georgopoulos's
population vector.

Figure 2: Encoders and decoders obtained using the optimization
procedures of sections 2.2 and 2.3. (a) An orthonormal basis for the
minimal space underlying the piecewise-linear firing rates. Any line
segment over the interval [-1, 1] can be expressed as a linear
combination of these two basis functions. (b) Encoders found using the
optimization procedure. For each neuron, the encoder has a slope
identical to that of the firing-rate profile. (c) Decoders obtained in
the absence of noise require neurons of extreme precision to operate
properly. Only two decoders contribute significantly to the decoded
PDFs; all others are zero. (d) Decoders obtained assuming a small amount
of noise depend upon all of the neurons. The decoders shown here result
from a noise variance of 0.01, limiting the precision of the neurons to
biologically plausible levels. Note that these decoders can take on
negative values, so the functions reconstructed from the neural firing
rates may only approximate the encoded PDF.

We assume a minimal space spanned by two straight-line functions, shown
in Figure 2a, and take the activation function to be rectification,

$$ f(u) = \max(u, 0) \qquad (13) $$

Since we are interested in representing a precise input, we choose
$p(x;t) = \delta(x - \xi(t))$. Applying the optimization procedure, we
obtain a set of encoders (Figure 2b) that exactly reconstruct the neural
activity patterns for input PDFs of the assumed Dirac delta form.



2.3 Obtaining the Decoding Functions 

A similar procedure is used to find the decoding functions. We first 
account for the limited precision of neural firing rates and for any 
intrinsic noise of real neurons by converting the neural firing rates into 
stochastic processes:

$$ a_i(A) \rightarrow a_i(A) + \epsilon_i \qquad (14) $$

where $\epsilon_i$ represents the noise source. We assume $\epsilon_i$ to have
zero mean without loss of generality; a nonzero mean can be absorbed into
the firing rate profiles, if needed. The above encoding functions are
unchanged by the presence of zero-mean noise.

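To ensure that the encoders and decoders found are not dependent on a
particular realization of the noise, we define the cost function

$$ E_2 = \frac{1}{2} \left\langle \iint \left[ p(x;A) - \sum_i \left( a_i(A) + \epsilon_i \right) \phi_i(x) \right]^2 p(A)\, dx\, dA \right\rangle \qquad (15) $$

Here, the angle brackets indicate an ensemble average over realizations
of the neuronal noise. Substituting equation 6 into $E_2$, we have

$$ E_2 = \frac{1}{2} \left\langle \iint \left[ p(x;A) - \sum_{i,\nu} \Phi_\nu(x)\, n_{\nu i} \left( a_i(A) + \epsilon_i \right) \right]^2 p(A)\, dx\, dA \right\rangle \qquad (16) $$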

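To find the $n_{\nu i}$ that minimize this cost function, we calculate
$\partial E_2 / \partial n_{\nu j}$. Taking each $\epsilon_i$ to be independent,
identically distributed, zero-mean Gaussian noise with variance $\sigma^2$
produces

$$ \frac{\partial E_2}{\partial n_{\nu j}} = -M_{\nu j} + \sum_i n_{\nu i} \left( \Gamma_{ij} + \sigma^2 \delta_{ij} \right) \qquad (17) $$

where

$$ M_{\nu j} = \iint p(x;A)\, a_j(A)\, \Phi_\nu(x)\, p(A)\, dx\, dA \qquad (18) $$

and

$$ \Gamma_{ij} = \int a_i(A)\, a_j(A)\, p(A)\, dA \qquad (19) $$

Setting the derivatives to zero and recasting equation 17 in matrix
form, we have

$$ \left( \Gamma + \sigma^2 I \right) \mathbf{n}^{\mathsf{T}} = \mathbf{M}^{\mathsf{T}} \qquad (20) $$

where $\mathbf{n}$ and $\mathbf{M}$ are the $D \times N$ matrices with elements
$n_{\nu i}$ and $M_{\nu i}$. We can solve directly for $\mathbf{n}$ by inverting
$(\Gamma + \sigma^2 I)$.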
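Equation 20 has the form of a ridge-regression problem, with $\sigma^2$ as the regularization strength, and can be sketched in Python for delta-function inputs, where the x integral in equation 18 collapses to $\Phi_\nu(\xi)$. The population (20 hypothetical rectified-linear profiles in matched pairs of opposite slope), the uniform $p(A)$ sampled at 400 points, and the straight-line basis are assumptions for illustration. The sketch compares a noise variance of 0.01 with a nearly noise-free value.

```python
import numpy as np

S = 400
xi = -1.0 + (np.arange(S) + 0.5) * (2.0 / S)   # uniform samples: p(A)

# Basis values Phi_nu(xi); for delta-function inputs the integral over x in
# equation 18 collapses to Phi_nu(xi).
Phi = np.vstack([np.full_like(xi, 1.0 / np.sqrt(2.0)), np.sqrt(1.5) * xi])  # (D, S)

# Hypothetical piecewise-linear rate profiles a_i(xi) = max(0, g_i (xi - c_i)).
N = 20
gains = np.tile([1.0, -1.0], N // 2)
thresholds = np.repeat(np.linspace(-0.8, 0.8, N // 2), 2)
rates = np.maximum(gains[:, None] * (xi[None, :] - thresholds[:, None]), 0.0)  # (N, S)

def fit_decoders(sigma2):
    """Solve (Gamma + sigma^2 I) n^T = M^T (equation 20); returns n, shape (D, N)."""
    Gamma = rates @ rates.T / S           # equation 19, averaged over p(A)
    M = Phi @ rates.T / S                 # equation 18
    nT = np.linalg.solve(Gamma + sigma2 * np.eye(N), M.T)
    return nT.T

def decode_error(n):
    # Decoded minimal coefficients sum_i n_{nu i} a_i(xi) versus the exact Phi_nu(xi).
    return np.sqrt(np.mean((n @ rates - Phi) ** 2))

n_noisy = fit_decoders(0.01)
n_precise = fit_decoders(1e-8)
print(decode_error(n_noisy), np.linalg.norm(n_noisy), np.linalg.norm(n_precise))
```

Increasing $\sigma^2$ shrinks the overall size of the decoder coefficients at the cost of a slightly larger residual; the nearly noise-free solution needs larger weights, and hence more precise firing rates, to reach its smaller residual.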

The inclusion of noise is essential for producing sensible decoders.
To illustrate this fact, we determine decoders for neurons with
piecewise-linear activity patterns employing the basis shown in
Figure 2a (discussed in section 2.2). The decoders are used to attempt a
reconstruction of the original delta-function PDFs, inverting the
encoding process previously considered. With $\sigma^2 = 0$, the algorithm
produces two decoders that play a significant role while the others are
all zero (Figure 2c). This noise-free solution evidently requires neurons
that are extremely precise in their firing rates, rather than making use
of redundant neurons to improve the quality of the representation. With
noise present ($\sigma^2 > 0$), we determine a set of decoders that uses
all of the neurons in the representation (Figure 2d) and does not depend
on unrealistically precise firing rates.

Having determined the decoders, we can directly transform between the
explicit, implicit, and minimal spaces. The transformation rules are
summarized pictorially in Figure 3.

2.4 Dimensionality of the Minimal Space 

The structure of the neural representations created depends critically 
upon the dimensionality D of the associated spaces. We can most 
easily explore the effect of the dimensionality in the minimal space, 
where D is simply equal to the number of basis functions $\Phi_\mu(x)$.

By way of illustration, let us pattern the basis functions after the
Legendre polynomials $P_\mu(x)$. The Legendre polynomials form an
orthogonal set, but are not normalized, so we define
$\hat{P}_\mu(x) = P_\mu(x) \big/ \sqrt{\int_{-1}^{1} P_\mu^2(x)\, dx}$ over the
interval [-1, 1]. For dimension D, we then set the minimal-space basis
function $\Phi_\mu(x)$ equal to the normalized Legendre polynomial
$\hat{P}_{\mu-1}(x)$ for $\mu = 1, 2, \ldots, D$.
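The normalization described here can be checked numerically. The Python sketch below builds $\Phi_\mu = \hat{P}_{\mu-1}$ with NumPy's `numpy.polynomial.legendre.Legendre` class, using the standard result $\int_{-1}^{1} P_n^2\,dx = 2/(2n+1)$ for the normalizing constant, and verifies orthonormality with a midpoint-rule quadrature (the grid size is an arbitrary choice).

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_basis(D, x):
    """Return Phi_mu(x) = P_hat_{mu-1}(x), mu = 1..D, as an array of shape (D, len(x)).

    Uses the closed form of the normalization integral:
    integral_{-1}^{1} P_n(x)^2 dx = 2 / (2n + 1).
    """
    rows = []
    for n in range(D):
        P_n = legendre.Legendre.basis(n)
        rows.append(P_n(x) * np.sqrt((2 * n + 1) / 2.0))
    return np.vstack(rows)

G = 4000
dx = 2.0 / G
x = -1.0 + (np.arange(G) + 0.5) * dx   # midpoint grid on [-1, 1]

Phi = legendre_basis(8, x)
gram = Phi @ Phi.T * dx                  # should be close to the identity
print(np.round(gram, 3))
```

For D = 2 the construction gives the straight-line pair $1/\sqrt{2}$ and $\sqrt{3/2}\,x$, an orthonormal basis for line segments on [-1, 1] of the kind shown in Figure 2a.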

To demonstrate the effect of the dimension D upon the quality of 
the neural representation, we compare an assumed target PDF with the 
PDF as represented in neural populations. We vary D and generate, as
described in sections 2.2 and 2.3, encoding and decoding functions
optimized to work with neurons with firing rate profiles as shown in
Figure 1. Using equation 5, the target PDF is encoded into neural firing
rates, which are then decoded using equation 4.
[Figure 3 diagram: the implicit, explicit, and minimal spaces, connected
by the transformation rules, including $A_\mu(t) = \int \Phi_\mu(x)\, p(x;t)\, dx$.]

Figure 3: Transformations between the representations. With the indicated
rules, we can readily switch between the implicit, explicit, and minimal
spaces, associated respectively with the variables x, $a_i$
($i = 1, 2, 3, \ldots, N$), and $A_\mu$ ($\mu = 1, 2, 3, \ldots, D$), and select
the most convenient one for any given task.
Figure 4: The effect of the dimensionality D on the quality of the neural
representation. As D is increased, the decoded PDF more closely matches 
the original PDF. 



With a bimodal target PDF, increasing D improves the quality of the
decoded PDF (Figure 4). For D = 2, only a straight line is decoded
(although this may still be useful; see sections 2.5 and 3.2), while for
D = 8, the decoded PDF matches the target PDF quite well.

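The trend in Figure 4 can be illustrated in the minimal space alone by projecting a bimodal PDF onto the first D normalized Legendre polynomials with equation 7 and reconstructing it with equation 3. This Python sketch bypasses the neurons entirely, so it shows only the best approximation available in a D-dimensional minimal space, not the decoded PDFs of Figure 4; the two-Gaussian target and its parameters are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

G = 4000
dx = 2.0 / G
x = -1.0 + (np.arange(G) + 0.5) * dx

def legendre_basis(D, x):
    # Orthonormal Legendre basis on [-1, 1]: P_n(x) * sqrt((2n + 1) / 2).
    return np.vstack([legendre.Legendre.basis(n)(x) * np.sqrt((2 * n + 1) / 2.0)
                      for n in range(D)])

# Hypothetical bimodal target: equal mixture of two Gaussians.
target = (np.exp(-0.5 * ((x + 0.5) / 0.15) ** 2)
          + np.exp(-0.5 * ((x - 0.5) / 0.15) ** 2))
target /= target.sum() * dx

def projection_error(D):
    Phi = legendre_basis(D, x)
    A = Phi @ target * dx          # equation 7
    approx = A @ Phi               # equation 3
    return np.sqrt(np.sum((target - approx) ** 2) * dx)

errors = {D: projection_error(D) for D in (2, 4, 8)}
print(errors)
```

Because the assumed target is symmetric about x = 0, only the even-degree polynomials contribute, so D = 2 captures nothing beyond the total probability.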
2.5 A Neural Integrator Model 

An important example of a neural integrator is the group of neurons 
that maintain the eyes in a fixed position in the absence of visual 
input. These recurrently connected neurons are able to hold the eye 
in position for times much longer than the interspike interval of the 
neurons. Collectively, they form an attractor network that acts as a
memory of eye position which lasts for several seconds (Seung, 1996).

By introducing temporal dynamics into the underlying probabilistic
models, we can create a model of a neural integrator. The dynamics are
straightforward: for a short time $\tau$, the PDF should be unchanged, so

$$ p(x;t+\tau) = p(x;t) \qquad (21) $$

where x is the value (e.g., eye position) stored in the memory.
As discussed above, we generate decoding functions using piecewise-linear
activities, linear encoders, and a rectifying activation function. Making
use of this representation, the encoding and decoding rules (equations 4
and 5), and the probabilistic dynamics (equation 21), we can show that

$$ a_i(t+\tau) = f\left( \int \tilde{\phi}_i(x)\, p(x;t+\tau)\, dx \right) \qquad (22) $$

$$ = f\left( \sum_j \left[ \int \tilde{\phi}_i(x)\, \phi_j(x)\, dx \right] a_j(t) \right) \qquad (23) $$

Defining weights

$$ w_{ij} = \int \tilde{\phi}_i(x)\, \phi_j(x)\, dx \qquad (24) $$

we may rewrite this as

$$ a_i(t+\tau) = f\left( \sum_j w_{ij}\, a_j(t) \right) \qquad (25) $$

The recurrent neural network that results is fully connected, with each
neuron having a synaptic connection to every other neuron.

The stored value of the eye position is extracted by calculating 
the expectation value of the random variable x, weighted by the de- 
coded PDF. Ideally, we would like any value in the supported range 
to be held constant, so that the network functions as a line attractor
(Seung, 1996), a kind of continuous attractor. However, the system
actually operates as a collection of point attractors with only a limited
number of stable fixed points, as can be seen from the network's transfer
function (Figure 5). The structure of the transfer function, and the
number of stable fixed points, depend on the dimensionality D of the
minimal space. As the dimensionality of the minimal space is increased,
the neural integrator can support additional stable fixed points,
eventually approximating a line attractor. This neural integrator model
is essentially a variation of the model constructed by Eliasmith and
Anderson (1999).
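A minimal Python sketch of the integrator dynamics of equation 25, under assumptions of our own: 20 hypothetical neurons with rectified-linear profiles and exactly linear encoders, the straight-line basis $\Phi_1 = 1/\sqrt{2}$, $\Phi_2 = \sqrt{3/2}\,x$, decoders from equation 20 with $\sigma^2 = 0.01$, and weights computed in the minimal space as $w_{ij} = \sum_\nu k_{i\nu} n_{\nu j}$ (equation 24 with both functions expanded in the orthonormal basis). The stored value is read out as $\langle x \rangle = A_2 / (\sqrt{3} A_1)$, which follows from integrating the straight-line PDF over [-1, 1]. The sketch is not tuned to reproduce Figure 5; where the state settles depends on the assumed profiles.

```python
import numpy as np

S = 400
xi = -1.0 + (np.arange(S) + 0.5) * (2.0 / S)
Phi = np.vstack([np.full_like(xi, 1.0 / np.sqrt(2.0)), np.sqrt(1.5) * xi])  # (D, S)

# Hypothetical population of N neurons with piecewise-linear profiles.
N = 20
gains = np.tile([1.0, -1.0], N // 2)
thresholds = np.repeat(np.linspace(-0.8, 0.8, N // 2), 2)

# Encoders phi_tilde_i(x) = g_i (x - c_i), expressed as k_{i nu} in the basis:
# g (x - c) = (-g c sqrt(2)) Phi_1(x) + (g / sqrt(1.5)) Phi_2(x).
k = np.column_stack([-gains * thresholds * np.sqrt(2.0), gains / np.sqrt(1.5)])  # (N, D)

# Decoders from equation 20 with noise variance sigma^2 = 0.01.
rates = np.maximum(k @ Phi, 0.0)                        # a_i(xi), shape (N, S)
Gamma = rates @ rates.T / S
M = Phi @ rates.T / S
n = np.linalg.solve(Gamma + 0.01 * np.eye(N), M.T).T    # (D, N)

# Equation 24 in the minimal space: w_ij = sum_nu k_{i nu} n_{nu j}.
W = k @ n

def mean_x(a):
    # <x> = integral x p dx / integral p dx with p = A_1 Phi_1 + A_2 Phi_2.
    A = n @ a
    return A[1] / (np.sqrt(3.0) * A[0])

def run(xi0, steps=200):
    # Encode delta(x - xi0) with equation 5, then iterate equation 25.
    a = np.maximum(k @ np.array([1.0 / np.sqrt(2.0), np.sqrt(1.5) * xi0]), 0.0)
    history = [mean_x(a)]
    for _ in range(steps):
        a = np.maximum(W @ a, 0.0)                      # equation 25
        history.append(mean_x(a))
    return np.array(history)

traj = run(0.3)
print(traj[0], traj[-1])
```

If decoding were exact on [-1, 1], any state of the form $s\,\Phi(\xi)$ with $s > 0$ would be mapped to itself, because rectification is positively homogeneous; in this sketch, any drift of $\langle x \rangle$ away from its initial value therefore reflects decoding error.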



[Figure 5 plot: transfer function of the neural integrator; axis labeled
$\langle x \rangle(t)$, spanning -1 to 1.]

Figure 5: The neural integrator model maintains only a limited number of
values, rather than an arbitrary input value. The number of stable fixed
points of the neural integrator model can be seen in the network's
transfer function. Here there are two stable fixed points for a neural
integrator consisting of 20 neurons with encoders and decoders found
using a minimal space with dimension D = 2. Increasing D to 4 increases
the number of stable fixed points to 3 (not shown), while increasing D to
6 yields 4 stable fixed points. With only the 20 neurons of limited
precision used here, further increases in D do not give rise to further
increases in the number of stable fixed points.

3 Probabilistic Inference Performed by Neural Networks

3.1 Inference

Inference between two related variables x and y in the implicit space is
performed by taking a weighted average of the conditional probability
$p(y \mid x)$:

$$ p(y;t+\tau) = \int p(y \mid x)\, p(x;t)\, dx \qquad (26) $$



We have assumed in equation 26 that the relationship between x and y is
independent of the values of the minimal parameters, so

$$ p(y \mid x; \{A_\mu(t)\}) = p(y \mid x) \qquad (27) $$

This assumption fixes the structure of the probabilistic model,
explicitly excluding learning from any neural networks derived from it.
The conditional probability $p(y \mid x)$ acts like a fixed look-up table;
the Marr-Albus theory of cerebellar function can be directly mapped into
equation 26 (Hakimian et al., 1999).



Mapping the implicit-space inference relation 26 into the explicit space
of neurons yields a neural network (Anderson, 1994, 1996; Zemel and
Dayan, 1997). Specifically, one imposes representations as given in
equations 4 and 5 for x, and

$$ p(y;t) = \sum_j b_j(t)\, \psi_j(y) \qquad (28) $$

$$ b_j(t) = f\left( \int \tilde{\psi}_j(y)\, p(y;t)\, dy \right) \qquad (29) $$

for y. Then one combines these representations with equation 26, leading
to

$$ b_j(t+\tau) = f\left( \sum_i w_{ji}\, a_i(t) \right) \qquad (30) $$

with the coupling coefficients

$$ w_{ji} = \iint \tilde{\psi}_j(y)\, p(y \mid x)\, \phi_i(x)\, dx\, dy \qquad (31) $$

For well-chosen encoding and decoding functions, equations 30 and 31
allow us to construct a neural network that embodies the desired
relationship between the implicit variables, without applying a training
procedure to find a relation from a data set.
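Equation 31 can be evaluated by straightforward quadrature once the encoders, decoders, and conditional density are sampled on a grid. In the Python sketch below, the straight-line encoders and decoders are arbitrary placeholders and $p(y \mid x)$ is a narrow Gaussian standing in for the $\delta(y - x)$ channel of section 3.2; all of these are assumptions. In the narrow-kernel limit the weights reduce to $\int \tilde{\psi}_j(x)\phi_i(x)\,dx$.

```python
import numpy as np

G = 400
dx = 2.0 / G
grid = -1.0 + (np.arange(G) + 0.5) * dx        # shared grid for x and y

def inference_weights(enc_y, cond, dec_x):
    """Equation 31: w_ji = double integral of psi_tilde_j(y) p(y|x) phi_i(x) dx dy.

    enc_y: (Ny, G) output encoders psi_tilde_j sampled on the y grid
    cond:  (G, G)  conditional density p(y|x), rows indexed by y, columns by x
    dec_x: (Nx, G) input decoders phi_i sampled on the x grid
    """
    return enc_y @ cond @ dec_x.T * dx * dx

def gaussian_conditional(width):
    # Hypothetical p(y|x): a Gaussian in y centred on x, normalized over y on the grid.
    c = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / width) ** 2)
    return c / (c.sum(axis=0, keepdims=True) * dx)

# Hypothetical straight-line encoders and decoders, for illustration only.
enc_y = np.vstack([1.0 + 0.5 * grid, 1.0 - 0.5 * grid, grid])
dec_x = np.vstack([0.3 + 0.2 * grid, 0.3 - 0.2 * grid])

w_channel = inference_weights(enc_y, gaussian_conditional(0.01), dec_x)
w_identity = enc_y @ dec_x.T * dx     # p(y|x) = delta(y - x) limit
print(np.round(w_channel, 4))
```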

This approach to inference is naturally extended to greater numbers of
implicit variables. For example, suppose we add a second input z to the
above network, and write

$$ p(y;t+\tau) = \iint p(y \mid x, z)\, p(x;t)\, p(z;t)\, dx\, dz \qquad (32) $$

Representing z using

$$ p(z;t) = \sum_k c_k(t)\, \theta_k(z) \qquad (33) $$

$$ c_k(t) = f\left( \int \tilde{\theta}_k(z)\, p(z;t)\, dz \right) \qquad (34) $$

leads to

$$ b_j(t+\tau) = f\left( \sum_{i,k} w_{jik}\, a_i(t)\, c_k(t) \right) \qquad (35) $$

with

$$ w_{jik} = \iiint \tilde{\psi}_j(y)\, p(y \mid x, z)\, \phi_i(x)\, \theta_k(z)\, dx\, dy\, dz \qquad (36) $$


An interesting feature of this neural network is that it employs
multiplicative interactions. This multiplication might be realized by
coincidence detection in the dendrites; the implication is that the
dendrites are active processing elements (Mel, 1994; Cash and Yuste,
1998).
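The multiplicative update of equation 35 amounts to contracting a third-order weight tensor with the two presynaptic rate vectors. The Python sketch uses `numpy.einsum` with random placeholder weights and rates (assumptions, not weights computed from equation 36), which makes the product $a_i c_k$ explicit.

```python
import numpy as np

def relu(u):
    return np.maximum(u, 0.0)

def multiplicative_update(w, a, c):
    """Equation 35: b_j = f( sum_{i,k} w_{jik} a_i c_k ).

    w: (Nj, Ni, Nk) third-order weight tensor, as produced by equation 36
    a: (Ni,) rates of the x population; c: (Nk,) rates of the z population
    """
    return relu(np.einsum('jik,i,k->j', w, a, c))

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 4, 5))     # hypothetical weights, for illustration
a = rng.random(4)
c = rng.random(5)
b = multiplicative_update(w, a, c)
print(b)
```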



3.2 A Communication Channel Model 

As a concrete example of probabilistic inference within the PDF scheme,
we now use equations 30 and 31 to implement a communication channel.
Specifically, we wish to encode a single input value $\xi(t)$ into a PDF
$p(x;t)$ represented by a population of neurons, and copy that PDF into
another PDF $p(y;t)$ represented by a second population of neurons. To
extract a unique output value from $p(y;t)$, we focus on the expectation
value of y. We use 20 neurons to represent the input PDF $p(x;t)$ and
16 neurons to represent the output PDF $p(y;t)$. The encoders and
decoders for these neurons are generated from two straight-line basis
functions (Figure 2a) and piecewise-linear neural responses as explained
previously (sections 2.2 and 2.3).

Since we only want to encode a single value, and not a complex
multimodal distribution, we describe the input using a PDF of the form
$p(x;t) = \delta(x - \xi(t))$. We set the form of the conditional PDF to be
$p(y \mid x) = \delta(y - x)$; accordingly, in the implicit space, we
expect that $p(y;t) = \delta(y - \xi(t))$. However, a PDF with such a
delta-function form is quite intractable in the explicit space: no finite
linear combination of functions can yield the expected form of $p(y;t)$.
Our goal is thus to obtain an accurate estimate of $\xi(t)$, rather than a
perfect reconstruction of the PDFs.

To interpret the performance of the neural network, we compare the
expectation value $\langle y \rangle$ (weighted by the PDF decoded from the
network outputs $\{b_j(t)\}$) to the input $\xi$. The decoded PDF is a
weighted sum of linear decoding functions, and is thus a straight line
itself. This is of course a poor reproduction of the Dirac delta function
input, but $\langle y \rangle$ closely follows the input values (Figure 6).
We may understand this by considering the basis functions used: they are
well suited for calculating the 0th and 1st moments of the PDF, but
unsuitable for calculating higher-order moments.
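The claim that a two-dimensional straight-line space suffices for the mean can be made concrete. Assuming the basis $\Psi_1 = 1/\sqrt{2}$, $\Psi_2 = \sqrt{3/2}\,y$ on [-1, 1] (one orthonormal choice for Figure 2a), integrating $p(y) = B_1\Psi_1 + B_2\Psi_2$ gives total weight $\sqrt{2}\,B_1$ and first moment $\sqrt{2/3}\,B_2$, so $\langle y \rangle = B_2/(\sqrt{3}\,B_1)$. The Python sketch checks that the projection of $\delta(y - \xi)$ returns $\langle y \rangle = \xi$.

```python
import numpy as np

def mean_from_minimal(B):
    """<y> for p(y) = B_1 Psi_1(y) + B_2 Psi_2(y) on [-1, 1], with the assumed
    basis Psi_1 = 1/sqrt(2), Psi_2 = sqrt(3/2) y.

    integral of p dy     = sqrt(2) B_1        (total weight, 0th moment)
    integral of y p dy   = sqrt(2/3) B_2      (1st moment)
    """
    return B[1] / (np.sqrt(3.0) * B[0])

xi = 0.3
B_delta = np.array([1.0 / np.sqrt(2.0), np.sqrt(1.5) * xi])  # projection of delta(y - xi)
print(mean_from_minimal(B_delta))   # 0.3 (up to rounding)
```

The ratio is unchanged if both coefficients are scaled by the same positive factor, so the estimate does not depend on the overall gain of the output population.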
Figure 6: Although the PDF is not accurately represented, the mean value
of the PDF can be satisfactorily retrieved from the neural network. By 
using only two basis functions to generate the decoders, the output PDFs 
are elements of a space of dimension two. This is suitable for representing 
the total weight and the mean of a PDF, but not higher moments. 






3.3 Working in the Minimal Space 

So far, we have used the concept of the minimal space as a tool for
developing the encoders and decoders. However, we can also make direct
use of the minimal space to set up abstract networks, then convert those
into networks of real neurons. To accomplish this, we derive relations
between the activities in the two spaces ($\{A_\mu(t)\}$ and $\{a_i(t)\}$).
The neural network in the explicit space then constitutes a physical
implementation of the abstract network in the minimal space. Issues
concerning the role of neuronal firing rate variability in the population
code (see for example Abbott and Dayan, 1999) may thus be separated from
those concerning the propagation of probabilistic information.

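First, consider the decoding rules given by equations 3 and 4. Making
use of equation 6, we obtain

$$ \sum_\mu A_\mu(t)\, \Phi_\mu(x) = \sum_i a_i(t) \sum_\mu n_{\mu i}\, \Phi_\mu(x) \qquad (37) $$

Since the $\Phi_\mu(x)$ are orthonormal functions, we have

$$ A_\mu(t) = \sum_i n_{\mu i}\, a_i(t) \qquad (38) $$

for transforming from the explicit space to the minimal space.

Next, consider the encoding rule given by equation 5. Recalling that
$\tilde{\phi}_i(x) = \sum_\nu k_{i\nu}\, \Phi_\nu(x)$ and
$A_\nu(t) = \int \Phi_\nu(x)\, p(x;t)\, dx$, we have

$$ a_i(t) = f\left( \sum_\nu k_{i\nu}\, A_\nu(t) \right) \qquad (39) $$

for transforming from the minimal space to the explicit space.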
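Equations 38 and 39 can be checked against the longer route through the implicit space. The Python sketch uses random placeholder coefficients $n_{\nu i}$ and $k_{i\nu}$ (assumptions), the straight-line basis, and midpoint quadrature: decoding with equation 4 and projecting with equation 7 should give the same $A_\mu$ as equation 38, and encoding with equation 5 should give the same rates as equation 39.

```python
import numpy as np

G = 2000
dx = 2.0 / G
x = -1.0 + (np.arange(G) + 0.5) * dx
Phi = np.vstack([np.full_like(x, 1.0 / np.sqrt(2.0)), np.sqrt(1.5) * x])  # (D, G)

rng = np.random.default_rng(1)
N, D = 6, 2
n = rng.normal(size=(D, N))   # hypothetical decoder coefficients n_{nu i}
k = rng.normal(size=(N, D))   # hypothetical encoder coefficients k_{i nu}

def explicit_to_minimal(a):
    """Equation 38: A_mu = sum_i n_{mu i} a_i."""
    return n @ a

def minimal_to_explicit(A):
    """Equation 39: a_i = f( sum_nu k_{i nu} A_nu ), with f = rectification."""
    return np.maximum(k @ A, 0.0)

a = rng.random(N)

# Equation 38 versus the long way round: decode p(x) with equation 4,
# then project onto the basis with equation 7.
decoders = n.T @ Phi                        # phi_i(x), shape (N, G)
p = a @ decoders                            # equation 4
A_long = Phi @ p * dx                       # equation 7
A_short = explicit_to_minimal(a)

# Equation 39 versus equation 5 with encoders phi_tilde_i = sum_nu k_{i nu} Phi_nu.
encoders = k @ Phi
a_long = np.maximum(encoders @ p * dx, 0.0)  # equation 5
a_short = minimal_to_explicit(A_short)
print(A_long, A_short)
```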

Using equations 38 and 39, we can translate between the minimal and
explicit spaces. This allows us to set up neural networks by first
working in the mathematically convenient minimal space. To illustrate
this procedure, we return to the $X \rightarrow Y$ inference network. We take
the minimal spaces for both the input x and the output y to be defined
by linear functions over the interval [-1, 1], with basis functions of
the form shown in Figure 2a. The associated PDFs are represented using
equation 3 and

$$ p(y;t) = \sum_\nu B_\nu(t)\, \Psi_\nu(y) \qquad (40) $$

With these representations, the probabilistic relation given in
equation 26 becomes

$$ B_\nu(t+\tau) = \sum_\mu \Omega_{\nu\mu}\, A_\mu(t) \qquad (41) $$

where

$$ \Omega_{\nu\mu} = \iint \Psi_\nu(y)\, p(y \mid x)\, \Phi_\mu(x)\, dx\, dy \qquad (42) $$

We next convert this into a neural network in the explicit space using
equations 38 and 39, so that

$$ b_j(t+\tau) = f\left( \sum_\nu k^{(y)}_{j\nu}\, B_\nu(t+\tau) \right) = f\left( \sum_{\nu,\mu,i} k^{(y)}_{j\nu}\, \Omega_{\nu\mu}\, n_{\mu i}\, a_i(t) \right) \qquad (43) $$

where $k^{(y)}_{j\nu} = \int \tilde{\psi}_j(y)\, \Psi_\nu(y)\, dy$ are the
encoder coefficients of the output population. By identifying

$$ w_{ji} = \sum_{\nu,\mu} k^{(y)}_{j\nu}\, \Omega_{\nu\mu}\, n_{\mu i} \qquad (44) $$

we may rewrite equation 43 as

$$ b_j(t+\tau) = f\left( \sum_i w_{ji}\, a_i(t) \right) \qquad (45) $$

arriving at a neural network with the same feedforward dynamics
(equation 30) and the same synaptic weights (equation 31) found
previously.
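The factorization of equation 44 can be verified numerically: the weights assembled in the minimal space from $\Omega_{\nu\mu}$ should equal those computed directly from equation 31. In this Python sketch the Gaussian $p(y \mid x)$, the random coefficients, and the choice $\Psi_\nu = \Phi_\nu$ are assumptions; the population sizes (20 input, 16 output neurons) follow section 3.2.

```python
import numpy as np

G = 400
dx = 2.0 / G
grid = -1.0 + (np.arange(G) + 0.5) * dx
basis = np.vstack([np.full_like(grid, 1.0 / np.sqrt(2.0)), np.sqrt(1.5) * grid])  # Phi = Psi here

# Hypothetical conditional density p(y|x) (Gaussian in y around x, normalized on the grid).
cond = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 0.2) ** 2)
cond /= cond.sum(axis=0, keepdims=True) * dx

rng = np.random.default_rng(2)
n_x = rng.normal(size=(2, 20))    # hypothetical input decoder coefficients n_{mu i}
k_y = rng.normal(size=(16, 2))    # hypothetical output encoder coefficients k^(y)_{j nu}

# Equation 42: Omega_{nu mu} = double integral Psi_nu(y) p(y|x) Phi_mu(x) dx dy.
Omega = basis @ cond @ basis.T * dx * dx

# Equation 44: weights assembled in the minimal space.
w_minimal = k_y @ Omega @ n_x

# Equation 31: the same weights computed directly from the explicit-space functions.
enc_y = k_y @ basis            # psi_tilde_j(y)
dec_x = n_x.T @ basis          # phi_i(x)
w_direct = enc_y @ cond @ dec_x.T * dx * dx
print(np.abs(w_minimal - w_direct).max())
```

The minimal-space route requires integrating only the D × D matrix $\Omega$ (here 2 × 2), independent of how many neurons implement the network.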

This example reproduces results previously found by working in the 
explicit space, but also highlights several advantages of working in the 
minimal space. Perhaps most importantly, the fundamental structure 
of the neural networks is made more transparent by eliminating the 
redundancies that arise in the networks due to the limited
representational ability of neurons. Significantly, we see that the
computational properties of the nonlinear update rule for the output
neurons (equation 45) can be understood by studying the linear update
rule in the minimal space (equation 41), consistent with the population
vector representations investigated by Georgopoulos et al. (1986).



4 Conclusions 

We have examined some of the ramifications of the hypothesis that 
neural networks represent information as probability density functions. 
These PDFs are assumed to be expressible as a linear combination of some
implicit decoding functions, with the decoder for each neuron being 
weighted by its firing rate. The firing rates in turn may be obtained 
from a PDF using a complementary set of encoding functions. 
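The encode/decode cycle described above can be sketched as follows. Everything specific here is an assumption for illustration: the orthonormal linear basis on [-1, 1], the family of linear PDFs p(x) = (1 + 3mx)/2 with mean m (non-negative for |m| ≤ 1/3), ten hypothetical encoders expanded in that basis, a rectifying activation, and a squared-error cost minimized by least squares to obtain the decoders. The names `minimal_coeffs`, `encode`, `decode`, `C`, and `Dm` are ours, and least squares stands in for whatever cost-minimization procedure one prefers.

```python
import numpy as np

SQRT_HALF, SQRT_3_2 = np.sqrt(0.5), np.sqrt(1.5)  # Phi_0 = SQRT_HALF, Phi_1 = SQRT_3_2 * x

def minimal_coeffs(m):
    """Minimal-space coordinates (A_0, A_1) of p(x) = (1 + 3 m x)/2 on [-1, 1]."""
    m = np.atleast_1d(np.asarray(m, dtype=float))
    return np.stack([np.full_like(m, SQRT_HALF), SQRT_3_2 * m], axis=1)

# Hypothetical encoders phi~_i(x) = C[i, 0] Phi_0(x) + C[i, 1] Phi_1(x):
# offsets spread over [-0.3, 1.0] and alternating slopes +1, -1.
N = 10
C = np.stack([np.linspace(-0.3, 1.0, N), np.resize([1.0, -1.0], N)], axis=1)

def encode(A):
    """Rates a_i = g(int phi~_i(x) p(x) dx) = g(sum_mu C[i, mu] A_mu), g = rectifier."""
    return np.maximum(A @ C.T, 0.0)

# Decoders phi_i(x) = sum_mu Dm[i, mu] Phi_mu(x), fitted by least squares so
# that the decoded coordinates a @ Dm match A over a training family of PDFs.
m_train = np.linspace(-1.0 / 3.0, 1.0 / 3.0, 101)
A_train = minimal_coeffs(m_train)
Dm, *_ = np.linalg.lstsq(encode(A_train), A_train, rcond=None)

def decode(a, x):
    """p_hat(x) = sum_i a_i phi_i(x) for a single rate vector a of shape (N,)."""
    A_hat = a @ Dm
    return A_hat[0] * SQRT_HALF + A_hat[1] * SQRT_3_2 * np.asarray(x, dtype=float)

x = np.linspace(-1.0, 1.0, 5)
rates = encode(minimal_coeffs(0.12))[0]   # firing rates for the PDF with m = 0.12
p_hat = decode(rates, x)                  # reconstructed density at the points x
```

A few encoders with large offsets stay active over the whole family of PDFs, so their rates vary linearly with m; the least-squares decoders can therefore reconstruct these PDFs essentially exactly even though the remaining neurons are rectified.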

In general, the encoding and decoding functions that we have intro- 
duced are numerous enough to define spaces of very high dimension, 
far beyond the range of accurate representation by biological neurons
having a precision of only a few bits. To mediate this conflict be- 
tween computational requirements and biological reality, we have in- 
troduced an auxiliary representation of a lower-dimensional minimal 
space appropriate to the nature and scale of the computations that 
neurobiological systems actually perform on the relevant input and 
output analog variables. The basis functions in this minimal space are 
used to represent both the encoding and decoding functions, limiting 
the dimensionality of the spaces they define. As an added benefit, the 
minimal space — and the associated metaneuron variables — can be cho- 
sen to have properties that facilitate theoretical characterization of the 
neural networks resulting from the PDF hypothesis. 

These neural networks are based upon the available probabilis- 
tic information and upon the encoding and decoding functions. The 
synaptic weights of the networks are fully specified without a train- 
ing procedure. A natural extension of the work we have presented is 
the addition of learning rules for determining the weights. Learning 
rules would provide several advantages; in particular, they would fa- 
cilitate the generation of neural networks when data is available but 
the underlying computations are not entirely clear. The optimization 
procedure we utilized to find the encoding functions may be a useful 
starting point for identifying a more complete learning rule. 

Researchers in the fields of molecular biology, immunology, genetics, 
development, and evolution, all of which involve highly complex sys- 
tems having many degrees of freedom, are beginning to explore the use 
of "metavariables" as a formal means to reduce the dimensionality of 
the space of parameters that must be dealt with in achieving viable and 
tractable quantitative descriptions. The formal results we have derived 
for metavariable ("metaneuron") representation of function spaces and
the experience we have gained through associated model simulations 
may prove valuable for parallel investigations in these and other fields. 

Returning to the neurobiological context, we may comment on the 
role that is envisioned for the PDF formalism in the modeling of
brain function. Recent work based on population-temporal coding (e.g. 
Eliasmith and Anderson 1999, 2002) indicates that the modeling of low-level
sensory processing and output motor control does not require such a
sophisticated representation; manipulation of mean values is generally 
sufficient and the representations can be simplified to deal with vector 
spaces instead of function spaces. However, explicit representation of 
probabilistic descriptors of the state of knowledge of pertinent analog 
variables may prove indispensable to an understanding of higher-level
processes. For example, estimates of depth at each spatial location 
from the disparity between the images impinging on both eyes can 
never be made with precision using a purely bottom-up strategy. 

The modern approach to all higher-level image-processing tasks is 
driven by the theory of Bayesian inference, in which models are developed
and parameters estimated based on a set of well-defined rules within a
probabilistic framework. In a second paper (Barber et al. 2001), we carry
the PDF program a step further by formulating procedures for embedding
joint probabilities into neural networks. These
procedures allow us to design neural circuit models that pool multiple 
sources of evidence. In our view, this offers the most rational ap- 
proach to building and understanding cortical circuits that carry out 
well-posed information-processing tasks. 